Introduction

After what was analyzed in the lectures, I’m interested to see if we can visualize a high level of comorbidity between smoking, exercising and COVID-19 within the dataset. Before I try to put a model in place, I need to find external datasets about smoking and physical exercise such that I may be able to make any further analysis. I will attempt to relate the datasets through the country column.

Data

COVID-19

The COVID-19 data for this report consists of 2 CSVs that you can find here.

Each one represents the confirmed cases and deaths worldwide.

Confirmed Cases
Confirmed Deaths

Tobacco Atlas

For cigarette consumption I will use the dataset avaialable throught theTobacco Atlas available here.

Fields

  1. Country
  2. Average daily number of cigarettes consumed per adult (15+ yr) smoker, 2019

Ipsos Global Advisor

Global Views on Exercise and Team Sports

For the exercise information I will use the dataset available here.

Fields

  1. Country
  2. Mean Number of Hours Physical Excercise Per Week

United Nations

Department of Economic and Social Affairs, World Population Prospects 2022

For the age information I will use the dataset available here.

Fields

  1. Country
  2. Median Age

Table Joins

Now that we have the data loaded, let’s join all of the different tables by country and year

Mortality

Let’s graph the mortality rate before the Vaccine came out (August 2021).

Models

This is the model summary for cigarette consumption

## 
## Call:
## lm(formula = deaths ~ consumption, data = covid_stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -162110  -93661  -20662   38908  531415 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 194815.73   31870.81   6.113 5.64e-08 ***
## consumption   -113.89      33.39  -3.410   0.0011 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 130500 on 67 degrees of freedom
## Multiple R-squared:  0.1479, Adjusted R-squared:  0.1352 
## F-statistic: 11.63 on 1 and 67 DF,  p-value: 0.001103

This is the model summary for weekly excercise mean

## 
## Call:
## lm(formula = deaths ~ weekly_excercise_hours_mean, data = covid_stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -124917  -80723  -33363   12928  557159 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   162385      42162   3.851 0.000265 ***
## weekly_excercise_hours_mean   -10296       6415  -1.605 0.113182    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 138800 on 67 degrees of freedom
## Multiple R-squared:  0.03703,    Adjusted R-squared:  0.02266 
## F-statistic: 2.576 on 1 and 67 DF,  p-value: 0.1132

And this is the model summary for the median age

## 
## Call:
## lm(formula = deaths ~ median_age, data = covid_stats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -162708  -72866  -31953   40166  549404 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   402605      94665   4.253 6.69e-05 ***
## median_age     -8124       2508  -3.240  0.00186 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 131500 on 67 degrees of freedom
## Multiple R-squared:  0.1354, Adjusted R-squared:  0.1225 
## F-statistic: 10.49 on 1 and 67 DF,  p-value: 0.001865

Graphing the models

Cigarette Consumption

Weekly Excercise

Age

Bias & Conclusion

Finally, please find the session info below.

## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] scales_1.2.1     plotly_4.10.1    lubridate_1.9.0  timechange_0.1.1
##  [5] knitr_1.40       forcats_0.5.2    stringr_1.4.1    dplyr_1.0.10    
##  [9] purrr_0.3.5      readr_2.1.3      tidyr_1.2.1      tibble_3.1.8    
## [13] ggplot2_3.4.0    tidyverse_1.3.2 
## 
## loaded via a namespace (and not attached):
##  [1] assertthat_0.2.1    digest_0.6.30       utf8_1.2.2         
##  [4] R6_2.5.1            cellranger_1.1.0    backports_1.4.1    
##  [7] reprex_2.0.2        evaluate_0.18       highr_0.9          
## [10] httr_1.4.4          pillar_1.8.1        rlang_1.0.6        
## [13] curl_4.3.3          lazyeval_0.2.2      googlesheets4_1.0.1
## [16] readxl_1.4.1        rstudioapi_0.14     data.table_1.14.4  
## [19] jquerylib_0.1.4     rmarkdown_2.17      labeling_0.4.2     
## [22] googledrive_2.0.0   htmlwidgets_1.5.4   bit_4.0.4          
## [25] munsell_0.5.0       broom_1.0.1         compiler_4.2.1     
## [28] modelr_0.1.9        xfun_0.34           pkgconfig_2.0.3    
## [31] htmltools_0.5.3     tidyselect_1.2.0    viridisLite_0.4.1  
## [34] fansi_1.0.3         crayon_1.5.2        tzdb_0.3.0         
## [37] dbplyr_2.2.1        withr_2.5.0         grid_4.2.1         
## [40] jsonlite_1.8.3      gtable_0.3.1        lifecycle_1.0.3    
## [43] DBI_1.1.3           magrittr_2.0.3      vroom_1.6.0        
## [46] cli_3.4.1           stringi_1.7.8       cachem_1.0.6       
## [49] farver_2.1.1        fs_1.5.2            xml2_1.3.3         
## [52] bslib_0.4.1         ellipsis_0.3.2      generics_0.1.3     
## [55] vctrs_0.5.0         tools_4.2.1         bit64_4.0.5        
## [58] glue_1.6.2          crosstalk_1.2.0     hms_1.1.2          
## [61] parallel_4.2.1      fastmap_1.1.0       yaml_2.3.6         
## [64] colorspace_2.0-3    gargle_1.2.1        rvest_1.0.3        
## [67] haven_2.5.1         sass_0.4.2